An Empirical Study of Smoothing Techniques for Language Modeling

Authors

  • Stanley F. Chen
  • Joshua Goodman
Abstract

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
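To make the evaluation measure concrete, the following is a minimal sketch of a bigram model smoothed by linear interpolation, together with the cross-entropy of test data in bits per token. It is not the authors' method: the class name `InterpolatedBigram`, the fixed weight `lam=0.7`, and the add-one unigram estimate are all assumptions chosen for illustration. Jelinek-Mercer smoothing in practice estimates its interpolation weights from held-out data rather than fixing them by hand.

```python
import math
from collections import Counter


class InterpolatedBigram:
    """Illustrative bigram model: lam * P_ML(w | prev) + (1 - lam) * P_uni(w)."""

    def __init__(self, tokens, vocab, lam=0.7):
        # vocab is assumed to contain every training token.
        self.vocab = set(vocab)
        self.lam = lam  # hypothetical fixed weight, not tuned on held-out data
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # Count each token only where it acts as a history (has a successor).
        self.contexts = Counter(tokens[:-1])

    def p_unigram(self, word):
        # Add-one smoothing keeps unseen vocabulary words at nonzero probability.
        return (self.unigrams[word] + 1) / (self.total + len(self.vocab))

    def prob(self, prev, word):
        p_uni = self.p_unigram(word)
        seen = self.contexts[prev]
        if seen == 0:
            # Unseen history: fall back entirely to the unigram estimate.
            return p_uni
        p_ml = self.bigrams[(prev, word)] / seen
        return self.lam * p_ml + (1 - self.lam) * p_uni


def cross_entropy(model, test_tokens):
    """Average negative log2 probability per predicted token (bits per token)."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_sum = sum(math.log2(model.prob(prev, word)) for prev, word in pairs)
    return -log_sum / len(pairs)


train = "the cat sat on the mat".split()
test = "the dog sat on the mat".split()
vocab = set(train) | set(test)
model = InterpolatedBigram(train, vocab)
print(cross_entropy(model, test))
```

A lower cross-entropy on the test data means the model assigns that data higher probability, which is the sense in which one smoothing method can be said to outperform another.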

Similar resources

Applying Empirical Bayes to Map the Geographic Incidence of Tuberculosis in Mazandaran Province during the Years 1384-90 (Solar Hijri calendar)

Background and purpose: With the growing volume of information about illnesses and deaths, classified maps are an appropriate method for analyzing this type of data. Standardized infection rates are commonly used in disease mapping but have many defects. This study aimed to compare Poisson regression models and empirical Bayes models for preparing a geographical map of tuberculosis incidence in Mazanda...

A Smoothing Technique for the Minimum Norm Solution of Absolute Value Equation

One of the issues that has been considered by the researchers in terms of theory and practice is the problem of finding minimum norm solution. In fact, in general, absolute value equation may have infinitely many solutions. In such cases, the best and most natural choice is the solution with the minimum norm. In this paper, the minimum norm-1 solution of absolute value equation is investigated. ...

Evidence on Asset Sales and Income Management: Case of Iran

This study empirically examines whether managers manipulate reported income through the timing of sales of long-lived assets and investments. Several empirical implications of the income-smoothing and debt-equity hypothesis in the context of asset sales were tested. The findings are consistent with the timing of asset sales by managers so that the recognized accounting income from these sales s...

Prediction of global sea cucumber capture production based on the exponential smoothing and ARIMA models

Sea cucumber catch has followed “boom-and-bust” patterns over the period of 60 years from 1950-2010, and sea cucumber fisheries have had important ecological, economic and societal roles. However, sea cucumber fisheries have not been explored systematically, especially in terms of catch change trends. Sea cucumbers are relatively sedentary species. An attempt was made to explore whether the tim...

Least Squares Techniques for GPS Receivers Positioning Filter using Pseudo-range and Carrier Phase Measurements

In the present study, using the Least Squares (LS) method, we determine the position smoothing in a GPS single-frequency receiver by means of pseudo-range and carrier phase measurements. Applying pseudo-range or carrier phase measurements separately in GPS receiver positioning can lead to defects. With pseudo-range data, the position is less precise and more distorted. By use of...

Publication date: 1996